Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpGetFile fix heap corruption and optimize #44

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

bulk88
Copy link

@bulk88 bulk88 commented Oct 20, 2024

See commit details.

Another concern, why was this wonderful XSUB written in 2022 if its NOT USED anywhere else in perl/CPAN ecosystem and not used by blead perl/CPAN.pm/perl core?

https://grep.metacpan.org/search?q=HttpGetFile&qd=&qft=*.pm&qifl=

for ref/hotlinking

#30

-TODO list after this is applied

-$file arg can be instead a \$string to skip the disk file

-implement async/timeout/progress reporting functionality, only way to safely do it is basically, is have the user pass a Win32 signal name like "ZERO" or "NUM05" or "NUM24", and queue a safe-signal fire at that perl sub/PP signal, from the random thread pool thread. This is compatible with all perl XS-using Win32 GUI apps, and PP win32 sleep(), and will correctly interrupt PP crypto currency mining (100% CPU) from Perl_runops_standard().

When compiling blead perl 5.41 with GCC 8.3.0 i686, Win32.xs has a link
failure breaking blead.

C:/Strawberry/c/bin/../lib/gcc/i686-w64-mingw32/8.3.0/../../../../i686-w64-mingw
32/bin/ld.exe: C:\sources\perl5\cpan\Win32/Win32.xs:1880: undefined reference to
`WinHttpQueryHeaders@24'

Switch to loading winhttp.dll on demand with GetProcAddrees, not
static DLL linking.  Plus winhttp.dll has a huge tree of sub-dependency
DLLs to many other MS WinOS DLLs. Most Win32.pm users and most perl.exe
processes instances will never call Win32::HttpGetFile. Win32::HttpGetFile
is a great feature, but it will never have the usage demand of
Win32::GetLastError() for example, so load it on demand.

Fixing the GCC link failure is most important. This patch minimalistic to
get blead perl+GCC to compile. I see other cleanup that can be done but
this patch is minimalistic. Reference count the DLL Library handle for
ithread reasons and libperl unloads.
-sv_to_wstr_len(), if input is low ASCII clean, or basically UTF BMP clean
 just guess the initial WIDE buffer, and run through MultiByteToWideChar()
 only once for perfomance. Dont translate the same string twice. Non-BMP
 can't be guessed ahead.
-sv_to_wstr_len() handle error codes from MBTWC correctly, die() if UTF-8
 with UTF surrogates code points and other nasties, can't continue
 execution if there was no UTF16 output
-add shortcuts for empty string to code page converters


-dont use ST() over and over, nothing will realloc the PL stack
-dont keep Newx() mem blocks on C stack for long periods incase PL die()
-dont use Newx() for smallish length capped strings, C stack is better
 faster and leak proof
-original pull req says the proxy code is untested, but Safefree() is
 wrong on a MS native pointer, and instant "panic: free to wrong pool"
 if you try it, MS API docs say GlobalFree() is correct
-ProxyInfo.lpszProxy and friend, lift pointer, NULL it, then free it,
 incase struct ProxyInfo is used in future code refactors or its C scope
 lasts longer
-Perl-exception-proof DESTROY all winhttp.dll handles and the CreateFile()
 handle and WCHAR * file, with SVMG
-store LPCWSTR acceptTypes[] as const, its mis declared in MS API docs
-even though HGF is sync I/O, async I/O is TODO, spin perl's event loop
 with HGF_ASYNC_CHECK(); either %SIG{ALRM} can fire or some perl Win32
 GUI app will be more responsive than the current situation
-"ZeroMemory(&msgbuf, ONE_K_BUFSIZE * 2);" totally redundant, we aren't
 using PP

 my $str = "\x00" x 1024; pack('p',$str);

 here, 1 WIDE null char is enough for all APIs. Win32 API never determines
 output buffer length, by counting null bytes.
-CloseHandle() the file before DeleteFile(), perhaps lock vio was here b4.
-CloseHandle() before returning, mortal SVs get freed at PP ";"s, not at
 every token.
-wcsncpy(msgbuf", L"unable"), src is fixed length, use Move();
-GIMME_V is a getter function not a macro, don't call it multiple times
-don't ret any SV *s if caller will ignore them with G_VOID
-don't convert and alloc the extended WCHAR msgbuf if its empty string

This was tested with "$ENV{PERL_WIN32_INTERNET_OK} = 1;"
-preliminary delay loading of DLLs for ancient GCC Mingw.org,
 modern Mingw64 GCC, and MSVC, without using CC specific features. Notably
 old GCCs require a "dlltool" post 2010, to generate a .a from a .def.
 Strawberry 5.8.9's dlltool is too old. Plus politics by lead devs of
 Mingw64 and Mingw.org claiming delay loading is patented, or you can
 can fork ld, and ReactOS and WINE did fork ld, to add delay loaded
 DLLs as defined by MS ABI/PE spec. Strangely, MS API compliant
 __delayLoadHelper2 function, has been included for 15 years by both
 Mingw64 and Mingw.org in libmingwex.a or msvcrt.a, both projects
 won't explain why, and just reject all tickets and support requests on
 the topic. I have made working delay loading DLLs with GCC, but
 dlltool.exe distribution by 3rd party Mingw packagers is problematic.
 AFAIK both projects do not publish any gcc/ld Win32 binaries, so
 Strawberry Perl has always used unofficial 3rd party GCC binaries.

 The delay loading code here correctly ref counts DLLs on a
 psudofork/ithreads. Win32.pm static links too many DLLs most users will
 never call into. Infrastruture here allows more DLLs to be turned into
 delay loads beyond old CC old Perl requirements of delay load/runtime
 linking because of old CC .a/.lib files. Faster startup, less process
 unique memory with each DLL removed.

 w32ppport.h needs a code formatter run on it. Next step after this
 commit is winhttp.dll/HttpGetFile support, on 100% of Win32 Perls and
 CCs released in the last 20 years.

 Also keeping the fnc ptrs in MY_CXT vs true DLL/EXE global memory is for
 a future optimization. MY_CXT keeps some organization vs more complicated
 style of winhttp.dll fnc ptr storage, but a DllMain() and a
 CRITICAL_SECTION need to be added vs all these MY_CXT pointers and their
 indirection.

 Also certain groups of fnc ptrs must be fetch as a single unit, rather
 than the current 1 by 1 system.

 Using the offset into MY_CXT to find the initializer strings from the
 const struct, is highly efficient, and reduction arg counts at call
 sites for 1x ever executed branches.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant